22%
14.08.2020
Most storage devices have SMART capability, but can it help you predict failure? We look at ways to take advantage of this built-in monitoring technology with the smartctl utility from the Linux ...
S.M.A.R.T. (Self-Monitoring, Analysis, and Reporting Technology) is a monitoring system for storage devices that provides information about the status of a device and allows for the running of self ...
13%
10.06.2020
Top Tools
The simple monitoring tool top is often used to monitor individual systems and can be used for debugging. Because it is such a valuable and highly used tool, similar tools have been
13%
19.02.2020
to be on the system. If you want to build or run containers, you need to be part of that group. Adding someone to an existing group is not difficult:
$ sudo usermod -a -G docker layton
Chris Hoffman wrote an article
12%
07.03.2019
, simply run it as before. You monitor the GPU usage with the nvidia-smi command. If you run this command in a loop, you can watch the GPU usage as the code runs. If the code runs quickly or if not much
12%
05.12.2018
interact with the system?
One of the first things I learned as a system administrator is always to have a CLI link to systems so I can edit configuration files, monitor the system, restart services, read
12%
05.11.2018
for starting, executing, and monitoring work (normally a parallel job) on the set of allocated nodes.”
“… it arbitrates contention for resources by managing a queue of pending work.”
These three points
14%
12.09.2018
and monitoring NFS filesystems is showmount, which allows you to list the client name or IP address of the client and the mounted directory in host:dir format. The command showmount -e [host] tells you what
12%
08.08.2018
, indeed.
The Author
Jeff Layton has been in the HPC business for almost 25 years (starting when he was 4 years old). He can be found lounging around at a nearby Frys enjoying the coffee and waiting
12%
08.07.2018
, spot-monitoring the compute nodes, and debugging.
This list is just the short version; the real list is extensive. Anything you want to do on a single node can be done on a large number of nodes
12%
25.01.2018
-based answers are always better than guesses or suppositions. What’s the best way to have data? Be a lumberjack and log everything.
Logging
Regardless of what you monitor, you need to be a lumberjack and log it